139 research outputs found

    A Bayesian mixture modelling approach for spatial proteomics.

    Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, so that each protein has a probability distribution over sub-cellular locations. Bayesian computation is performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed, and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.
    LG was supported by the BBSRC Strategic Longer and Larger grant (Award BB/L002817/1) and the Wellcome Trust Senior Investigator Award 110170/Z/15/Z awarded to KSL. PDWK was supported by the MRC (project reference MC_UP_0801/1). CMM was supported by a Wellcome Trust Technology Development Grant (Grant number 108467/Z/15/Z). OMC is a Wellcome Trust Mathematical Genomics and Medicine student supported financially by the School of Clinical Medicine, University of Cambridge. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
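
The probabilistic assignment described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixture parameters are already known (no EM or MCMC is run), uses diagonal covariances for brevity, and all component names, weights, means and variances are made-up toy values.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplifying assumption)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posterior_probabilities(x, components):
    """Return P(class | x) for each class under a Gaussian mixture.

    components maps a class name to (weight, mean, var); all values are hypothetical.
    """
    log_joint = {
        name: math.log(w) + log_gaussian_diag(x, mean, var)
        for name, (w, mean, var) in components.items()
    }
    # Log-sum-exp normalisation for numerical stability.
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {name: math.exp(v - log_norm) for name, v in log_joint.items()}

# Toy parameters: two compartments, three gradient fractions.
components = {
    "mitochondrion": (0.5, [0.6, 0.3, 0.1], [0.01, 0.01, 0.01]),
    "cytosol": (0.5, [0.1, 0.3, 0.6], [0.01, 0.01, 0.01]),
}
probs_clear = posterior_probabilities([0.58, 0.31, 0.11], components)
probs_mixed = posterior_probabilities([0.35, 0.30, 0.35], components)
```

In this toy setting the first profile is assigned almost entirely to one compartment, while the second sits midway between the two and receives equal probability for each, which is the kind of uncertainty a hard classifier would not report.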

    A Bayesian semi-parametric model for thermal proteome profiling.

    Funder: Wellcome Trust
    The thermal stability of proteins can be altered when they interact with small molecules or other biomolecules, or are subject to post-translational modifications. Thus monitoring the thermal stability of proteins under various cellular perturbations can provide insights into protein function, as well as potentially determine drug targets and off-targets. Thermal proteome profiling is a highly multiplexed mass-spectrometry method for monitoring the melting behaviour of thousands of proteins in a single experiment. In essence, thermal proteome profiling assumes that proteins denature upon heating and hence become insoluble. Thus, by tracking the relative solubility of proteins at sequentially increasing temperatures, one can report on the thermal stability of a protein. Standard thermodynamics predicts a sigmoidal relationship between temperature and relative solubility and this is the basis of current robust statistical procedures. However, current methods do not model deviations from this behaviour and they do not quantify uncertainty in the melting profiles. To overcome these challenges, we propose the application of Bayesian functional data analysis tools which allow complex temperature-solubility behaviours. Our methods have improved sensitivity over the state-of-the-art, identify new drug-protein associations and have less restrictive assumptions than current approaches. Our methods allow for comprehensive analysis of proteins that deviate from the predicted sigmoid behaviour and we uncover potentially biphasic phenomena in a series of published datasets.
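
The sigmoidal temperature-solubility relationship mentioned in the abstract above can be sketched as follows. The abstract gives no equation, so the three-parameter form, the parameter names (`a`, `b`, `plateau`) and the numerical values are assumptions chosen for illustration only.

```python
import math

def relative_solubility(temp, a, b, plateau):
    """Illustrative sigmoid melting curve (assumed form, not taken from the paper).

    Approaches 1 at low temperature and `plateau` at high temperature.
    """
    return (1.0 - plateau) / (1.0 + math.exp(-(a / temp - b))) + plateau

# Hypothetical parameters; with plateau = 0 the curve crosses 0.5 at temp = a / b.
a, b, plateau = 550.0, 10.0, 0.0
melting_point = a / b
temperatures = [37.0, 46.0, 55.0, 61.0, 67.0]
curve = [relative_solubility(t, a, b, plateau) for t in temperatures]
```

A protein whose measured profile cannot be fitted well by any curve of this shape, for example one with two distinct drops, is the kind of deviation the abstract says current sigmoid-based procedures do not model.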

    Targeted treatment of yaws with contact tracing : how much do we miss?

    Yaws is a disabling bacterial infection found primarily in warm and humid tropical areas. The World Health Organization strategy mandates an initial round of total community treatment (TCT) with single-dose azithromycin followed either by further TCT or active case-finding and treatment of cases and their contacts (the Morges strategy). We sought to investigate the effectiveness of the Morges strategy. We employed a stochastic household model to study the transmission of infection using data collected from a pre-TCT survey conducted in the Solomon Islands. We used this model to assess the proportion of asymptomatic infections that occurred in households without active cases. This analysis indicated that targeted treatment of cases and their household contacts would miss a large fraction of asymptomatic infections (65%–100%). This fraction was actually higher at lower prevalences. Even assuming that all active cases and their households were successfully treated, our analysis demonstrated that at all prevalences present in the data set, up to 90% of (active and asymptomatic) infections would not be treated under household-based contact tracing. Mapping was undertaken as part of the study “Epidemiology of Yaws in the Solomon Islands and the Impact of a Trachoma Control Programme,” in September–October 2013.
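
The two "missed" quantities discussed in the abstract above can be made concrete with a small arithmetic sketch. This is not the stochastic household model used in the study: it only computes, for a fixed and entirely made-up set of households, the share of infections lying in households with no active case and therefore out of reach of household-based contact tracing.

```python
def missed_fractions(households):
    """households: list of (active_cases, asymptomatic_infections) per household.

    Returns (fraction of asymptomatic infections missed,
             fraction of all infections missed),
    assuming every household with an active case is found and fully treated.
    """
    total_asymptomatic = sum(asym for _, asym in households)
    total_infections = sum(active + asym for active, asym in households)
    # Households without an active case are never visited by contact tracing.
    missed = sum(asym for active, asym in households if active == 0)
    frac_asymptomatic = missed / total_asymptomatic if total_asymptomatic else 0.0
    frac_all = missed / total_infections if total_infections else 0.0
    return frac_asymptomatic, frac_all

# Hypothetical toy data: five households.
toy_households = [(1, 1), (0, 2), (0, 1), (0, 0), (2, 0)]
frac_asymptomatic, frac_all = missed_fractions(toy_households)
```

For this toy data, three of the four asymptomatic infections and three of the seven infections overall fall in households without an active case.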

    Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics.

    The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel.
    Medical Research Council, Funder Id: http://dx.doi.org/10.13039/501100000265, Wellcome Trust Mathematical Genomics and Medicine student supported financially by the School of Clinical Medicine, University of Cambridge. Grant Number: MC_UU_00002/10, MC_UU_00002/13
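
The idea of sequential, greedy allocation in a DP mixture, as opposed to MCMC sampling, can be sketched in a heavily simplified form. This is not the sugsvarsel implementation and omits variable selection, model selection and model averaging entirely. It assumes one-dimensional data, a Gaussian likelihood with known variance, a conjugate normal prior on each cluster mean, and fixed hyperparameters; every name and default value below is hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def greedy_sequential_clustering(data, alpha=1.0, prior_mean=0.0,
                                 prior_var=10.0, noise_var=0.25):
    """Assign each point, in order, to its highest-scoring cluster.

    Score = (DP prior weight) * (posterior predictive density). The shared
    normalising constant of the prior weights is dropped because it does not
    change which cluster scores highest.
    """
    clusters = []  # each entry is [member count, sum of members]
    labels = []
    for x in data:
        scores = []
        for n, s in clusters:
            # Conjugate update for the cluster mean, then predictive density.
            post_var = 1.0 / (1.0 / prior_var + n / noise_var)
            post_mean = post_var * (prior_mean / prior_var + s / noise_var)
            scores.append(n * normal_pdf(x, post_mean, post_var + noise_var))
        # Option of opening a new cluster, weighted by the concentration alpha.
        scores.append(alpha * normal_pdf(x, prior_mean, prior_var + noise_var))
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(clusters):
            clusters.append([1, x])
        else:
            clusters[best][0] += 1
            clusters[best][1] += x
        labels.append(best)
    return labels

labels = greedy_sequential_clustering([0.1, -0.2, 5.0, 5.2, 0.0])
```

Each observation is visited once and its allocation is never revisited, which is where the speed advantage over MCMC comes from; the number of clusters is not fixed in advance but grows whenever opening a new cluster scores highest.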

    Targeted Treatment of Yaws With Household Contact Tracing: How Much Do We Miss?

    Yaws is a disabling bacterial infection found primarily in warm and humid tropical areas. The World Health Organization strategy mandates an initial round of total community treatment (TCT) with single-dose azithromycin followed either by further TCT or active case-finding and treatment of cases and their contacts (the Morges strategy). We sought to investigate the effectiveness of the Morges strategy. We employed a stochastic household model to study the transmission of infection using data collected from a pre-TCT survey conducted in the Solomon Islands. We used this model to assess the proportion of asymptomatic infections that occurred in households without active cases. This analysis indicated that targeted treatment of cases and their household contacts would miss a large fraction of asymptomatic infections (65%-100%). This fraction was actually higher at lower prevalences. Even assuming that all active cases and their households were successfully treated, our analysis demonstrated that at all prevalences present in the data set, up to 90% of (active and asymptomatic) infections would not be treated under household-based contact tracing. Mapping was undertaken as part of the study "Epidemiology of Yaws in the Solomon Islands and the Impact of a Trachoma Control Programme," in September-October 2013.